User guide of GranulatShiny

Authors: Aurel Hebert–Burggraeve, Laure Simplet, Laurent Dubroca, Camille Vogel

Homepage


GranulatShiny is an application that facilitates the statistical processing of data collected as part of the initial studies, pre-construction baseline studies and environmental monitoring dedicated to fishery resources and ichthyofauna relating to the appraisal of applications for authorisation to extract marine aggregates. The application automates some of the data formatting and the calculation of standard biodiversity indicators, and provides decision keys for the more advanced processing stages.
Based on the calculated indicators and the user’s choices, the application produces figures and tables in formats corresponding to the recommendations of the French reference documents: the “Halieutic Protocol” and the methodological guide for the development of Orientation Documents for Sustainable Management of Marine Aggregates (DOGGM). The application provides an interactive graphical interface based on the R language, so that the user does not need to master R and can focus on the parameters of interest for diagnosing the potential effects of marine aggregate extraction on fish resources. GranulatShiny offers 3 statistical approaches (exploratory, descriptive, and inferential) that can be used primarily to quantify the influence of marine aggregate extraction on fish communities. Each approach covers an expected aspect of the “Halieutic Protocol”. The exploratory approach presents and analyzes data at the scale of the entire community. The descriptive approach presents and analyzes data at the scale of a single species. Finally, the inferential approach evaluates the temporal and spatial variability of the different indicators of fish resources before and during exploitation.
The current version of the application does not have a regulatory purpose but serves as an aid in the production of monitoring reports on fish communities. It does not replace the work already provided by consulting firms but complements what is already done by allowing the assembly of a statistical model to test the effects of certain parameters on fish communities.

This guide has been written to enable a GranulatShiny user to familiarize themselves with the application interface and understand the methodology developed behind each result provided by the application.
Before starting, note that there are different buttons in the application.

The buttons with a boat icon allow you to switch from one tab to another.

Those bearing a small green dragon indicate a mandatory stopover.

The buttons with an arrow allow you to download results from the application. The available formats are csv, png, txt, and rds.

Finally, those with a circle containing an “i” display help text to better understand a graph or other object proposed by the application.

When launching the application, the homepage opens automatically. On this page, various reference documents and information are listed with their associated URLs hyperlinked. A reminder of the context regarding marine aggregate extraction is located on the right side of the page.
To move to the next tab, press the “start” button on the homepage within the application.

Data formatting

 Information to enter: loading datasets to analyze.

The first step in using the application is to import the data collected. The data must comply with a standard defined by Ifremer, the outline of which can be found here:
https://raw.githack.com/GranulatShiny/GranulatShiny/main/Description_Format_Generique_GranulatShiny.html

There are three possible scenarios:

- Case 1: You are new to the tool and have no formatted data with which to test its functionality. In this case, you will use the dataset supplied with the tool
If this is an exercise in discovering the tool, a fictitious dataset is made available to the user along with the tool. The data is directly integrated into the application and can be loaded by selecting the “Non” (“No”) answer under the heading “Do you have your own data?”. This dataset is intended for educational purposes: it does not correspond to any real case and therefore cannot appear in documents with administrative value (i.e. monitoring reports, initial environmental status, reference status before works, etc.).

For the purposes of familiarisation with the tool, the dataset made available with the tool corresponds to a fictitious concession located in the Bay of Biscay. For this fictitious dataset, we consider a concession in operation from 2000 to 2030, for which monitoring of the fisheries compartment has been set up every 5 years with 2 years of initial status. The fictitious sampling plan provides for the sampling of 10 stations within the concession and 10 stations outside the concession. This choice does not correspond to a sampling plan that would have been developed with knowledge of the environmental conditions of the site (i.e. sedimentary facies, benthic habitats) and therefore does not correspond to the recommendations of the “Halieutic Protocol”. This fictitious sampling plan uses a beam trawl with a horizontal opening of 4.4 m and a tow length of 1000 metres.
To avoid any confusion, the 4 species present in this dataset are also fictitious. Each species has a population dynamic associated with a specific probability distribution law. Knowing this, it is possible to control the results from inferential statistics and the effects of the environment on the chosen species. The first species, Cephalaspis.tenuicornis, was not affected by the extraction and its spatio-temporal dynamics were stable throughout the monitoring period (i.e. there was no effect of time, space or environmental conditions on the observed abundances). Thus no potential effect of the variables on the abundance of this species will be detected. The Dimichtys.terreli species is impacted by extraction, but its spatio-temporal dynamics are stable throughout the monitoring period. There is therefore a significant effect of extraction, which is reflected in the difference between the values obtained by sampling in or outside the exploitation zone. Leedsischthys.problematicus is affected by extraction, but differently depending on the season. The seasons do not influence the population of this species in normal times, i.e. in the absence of aggregate extraction, but the interaction between the effect of extraction and the seasonal effect modifies the abundance of this species. Finally, Latimeria.chalumnae is not affected by the effect of extraction, nor by spatio-temporal effects such as season and/or environmental conditions, but the abundance of the species is naturally highly variable. These examples illustrate different responses to the environment in order to better understand what is sought during inferential analysis.

- Case 2: you have some initial experience of the tool and are starting to analyse your own dataset in the recommended format
If you are analysing real data in the appropriate format, you need to select and load the following files into the GranulatShiny graphical interface: “TuttiCatch.csv” and “TuttiOperation.csv”, which contain most of the information relating to the progress and results of the monitoring carried out. Only the csv format is supported. The “TuttiCatch.csv” file corresponds to the catch data from the sampling of fish populations and the “TuttiOperation.csv” file corresponds to all the information derived from the implementation of the protocol for each sampling station (i.e. date and name of the survey, fishing gear, the characteristics of which will be specified in the reports associated with the results, geographical coordinates of the shooting and hauling points and the associated times, total duration of the trawl haul, and shooting and hauling depths).
WARNING. The expected data format must be respected, otherwise the processing routines cannot run correctly and a warning message will appear on the interface. In this case, it is recommended that you compare your file against the expected file format. Furthermore, in the case of sub-sampling or information provided at individual level, it is important to report the information at tow scale so that each combination of species and tow corresponds to a single line in the data table. Otherwise a message will be returned indicating the existence of duplicates preventing the catch data from being processed.

Once the files have been loaded, you will have access to a number of new functions. A map centred on the concession will appear, displaying the sampling stations. You will also be able to interact with the “Impact stations” and “Reference stations” fields. You will also be able to import “ShapeFiles” to display the contours of the marine aggregate extraction concession.
Under the heading “Impact stations”, you can check and modify the operating period (the period during which extraction work takes place). You must also enter in the corresponding space the stations that are impacted by the extraction. The colour of the various samples will then change to red for the stations affected (see figure below).
In accordance with paragraph 8.3.2 of the “Halieutic Protocol”, the application is developed for the most common case of trawl sampling. The horizontal opening length of the trawl must be entered to calculate the sampled areas in order to work in density. At this stage of development, the application does not take into account other fishing gears.
Finally, in the exceptional case where a station entered in the “TuttiOperation.csv” file needs to be removed after the fact, this can be done under the “Reference stations” heading.

- Case 3: You have already used the tool to process your data. You have a summary file of all the parameters used for a previous analysis and you want to start again from this file.
If you have already saved the settings in a file, you can import them after the “TuttiCatch.csv” and “TuttiOperation.csv” files, so that the station fields are filled in automatically.

Production of indicator tables

When you have completed the data loading stage, you can press the button with the green dragon. This will launch the internal calculation of the various indicators and covariates required to analyse the data. If you do not press this button, nothing will happen and you will not be able to continue with the analysis.

Nota Bene: If you have more than one concession to analyse, you can return to this tab, change the files by loading those corresponding to this other concession (“TuttiCatch.csv”, “TuttiOperation.csv”, “ShapeFiles”), then press the green dragon again to restart production of the indicator tables.

General table

In the “Tables” tab, there is a data table on the right and an interactive section on the left. The table displayed is formed from the data entered in the ‘Data formatting’ tab. The table formatting functions will calculate abundance, biomass and various diversity indicators for each station and for each survey. The “treatment” variable, which indicates the state of each station, can take two values: “no impact” or “impact”. It indicates whether the station is within the perimeter of the concession and therefore considered to be impacted by the aggregate extraction work (i.e. “impact” mode) or whether the station is outside the perimeter of the concession (i.e. “no impact” mode). In the case of an initial state, where there has been no extraction on the site of the concession studied, the stations located inside the concession are assigned the “no impact” state until the start date of exploitation. This allows them to be considered as reflecting the state of the environment before any impact from extraction, for the purposes of the statistical analysis carried out afterwards.
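The assignment rule for the “treatment” variable described above can be sketched as follows. This is only an illustration (the application itself is written in R), and the function and argument names are hypothetical:

```python
from datetime import date

# Illustrative sketch: how the "treatment" state could be assigned.
# A station inside the concession counts as "impact" only once exploitation
# has begun; before that date it reflects the initial state of the site.
def treatment(inside_concession: bool, sampling_date: date, exploitation_start: date) -> str:
    if inside_concession and sampling_date >= exploitation_start:
        return "impact"
    return "no impact"
```

For example, a station located inside the concession but sampled before the start of exploitation keeps the “no impact” state.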

During the formatting processes, the season mode is calculated on the basis of the sampling start dates. The administrative framework is chosen by default to determine the seasons. However, it is possible to change this column in the full table. Particular attention is paid to the notion of season, as this is an integral part of the assessment of the temporal variability of fish communities (paragraph 8.1.4 of the Halieutic Protocol). According to the “Halieutic Protocol”, the effects of seasonal variability on fish assemblages (groups of species) are highly dependent on latitude. In northern waters (North Sea, English Channel, northern Bay of Biscay), it is common to observe only two types of fish assemblages per year, a winter assemblage for about eight months of the year and a summer assemblage for about four months. In the warmer waters of the south (south of the Bay of Biscay, Mediterranean), seasonal assemblages are potentially more numerous, with more marked spring and autumn assemblages. Nevertheless, it is the conduct of the initial survey that will make it possible to determine the seasonal variability locally and to decide on the seasonal periodicity of the monitoring surveys. By modifying the “season” column in the table, it is possible to adjust to local conditions and sampling difficulties.
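The derivation of the season from the sampling start date can be sketched as below. The month boundaries here follow the common meteorological convention; the administrative cutoffs actually applied by the application may differ, and the “season” column can in any case be edited by hand:

```python
# Illustrative sketch only: mapping the month of the sampling start date to a
# season. Assumed meteorological boundaries; the application's administrative
# framework may use different cutoffs.
def season(month: int) -> str:
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"
```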

You can change the general display of the data table using the arrow under the “which table to display” message, and you can download the table displayed using the “Download table” button. The “Download entered information” button is used to save the list of impact stations, the exploitation dates and the trawl opening width used in the “Import data” tab in a csv file. At the end of this tab, you can decide whether to carry out the “exploratory statistics” section, which looks at the community as a whole, or to go straight to the “descriptive statistics” section, which focuses on a specific variable.

Exploratory statistics

Representation of indicators

This section looks at the biodiversity and abundance indicators obtained at the community level, inside and outside the concession, for each data collection survey carried out. The indicators presented are those referred to in the “Halieutic Protocol”, article 8.4.1.

  Definition of indicators

Biodiversity encompasses the variety of life at all levels of organisation, classified according to evolutionary (phylogenetic) and ecological (functional) criteria. At the level of biological populations, genetic variation between individual organisms and between lineages contributes to biodiversity as a signature of evolutionary and ecological history and a basis for future adaptive evolution. It is at the species level that the term biodiversity is most often applied by ecologists and conservation biologists. Species richness refers to the total number of species present in a given ecosystem. It is a simple measure that only takes into account the number of species without considering their relative abundance. For example, if a tropical forest contains 100 different tree species, its species richness would be 100.

A diversity index is a mathematical expression that combines species richness and evenness to measure diversity. The main objective of a diversity index is to obtain a quantitative estimate of biological variability that can be used to compare biological entities in space or time. This index takes into account two different aspects that contribute to the concept of diversity in a community: species richness and evenness.

The Shannon-Weaver diversity index is a widely used index for comparing diversity between different habitats. It assumes that individuals are randomly sampled from a large independent population and that all species are represented in the sample. This index measures both species richness and the evenness (or uniformity) of species distribution in an ecosystem. It takes into account both the number of species present and their relative abundance. More specifically, the Shannon-Weaver index is calculated using the following formula:

\[ H' = -\sum_{i=1}^{S} p_i \ln(p_i) \]

where:
S is the total number of species,
p_i is the proportion of individuals belonging to the i-th species,
ln is the natural logarithm.

The value of the Shannon-Weaver diversity index is generally between 1.5 and 3.5 and rarely exceeds 4.5. A higher Shannon-Weaver index indicates a greater diversity of species and a more uniform distribution of individuals among these species.
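As an illustration (not the application’s own code, which runs in R), the Shannon-Weaver index can be computed from raw species counts as follows:

```python
import math

# Illustrative computation of the Shannon-Weaver index from species counts.
def shannon_index(counts):
    """H' = -sum(p_i * ln(p_i)), summed over species with p_i > 0."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)
```

For four equally abundant species the index reaches its maximum, ln(4) ≈ 1.39; the more uneven the counts, the lower the value.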

Unlike the Shannon-Weaver index, the Simpson index focuses primarily on the dominance of the most abundant species in an ecosystem. It is calculated using the following formula:

\[ D = \sum_{i=1}^{S} p_i^2 \]

where the terms are the same as in the Shannon-Weaver index. A higher Simpson index indicates lower biodiversity, because it corresponds to the probability that two individuals chosen at random belong to the same species.
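Again as an illustration only, the Simpson index can be computed from species counts as follows:

```python
# Illustrative computation of the Simpson index D = sum(p_i^2): the
# probability that two randomly drawn individuals belong to the same species
# (the application performs this internally in R).
def simpson_index(counts):
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)
```

A single-species sample gives D = 1 (minimum diversity); four equally abundant species give D = 0.25.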

  Representation in the application

Firstly, the table (below) shows the mean values for abundance, biomass, species richness, Shannon and Simpson indicators inside the concession, outside the concession and overall for each survey. They are calculated from the values obtained at each sampling station. For easier reading, the standard deviations are not displayed in the table in the application but are available in the csv file that can be downloaded via the “Download table” button.

The graphs below show the mean values (dots) and the 5th and 95th percentiles (upper and lower bars) obtained for the same indicators as those in the table, depending on the survey selected and the sector sampled (paragraph 8.4.1 of the Halieutic Protocol). They provide a quick overview of the differences in values obtained between the concession area and the reference area for the most common biodiversity indicators.


The advantage of these approaches is that the fish community can be compared on several scales. Initially, the comparison focuses on the inside or outside of the concession. But if the surveys are looked at one after the other, it may be possible to distinguish changes over time. There is both a spatial and a temporal aspect.

Representation of the structure

This table represents the proportion of each species present for each sampling survey (paragraph 8.4.1 of the Halieutic Protocol). The table makes it possible to monitor changes in the proportions of species over time and provides a perspective on trends in assemblages. Variations in the proportions of different species from one year to the next can indicate significant ecological changes, such as fluctuations in biodiversity, changes in habitats or environmental pressures. This table can also be used to identify which species are dominant in a given ecosystem and which are in decline.

The figure below is made up of three graphs representing the abundance of species in a given survey in descending order (left), the relative contribution of each species to the total abundance (top right) and the species accumulation curve (bottom right). It provides information to meet the expectations of paragraph 8.4.1 of the “Halieutic Protocol”.

The figure on the left shows the abundance of each species for a survey in a histogram ordered by decreasing abundance. This makes it possible to identify the dominant species within a survey, i.e. those that are most abundant in the sample. These dominant species can play a crucial role in structuring the ecosystem studied, influencing, for example, competition for resources or predation on other species.
In addition, by observing changes from one survey to the next, this histogram can help to visualise temporal trends. For example, an increase or decrease in the abundance of a dominant species could indicate changes in environmental conditions or in the interactions between species. It can also be used to detect seasonal and annual variations, such as breeding peaks or seasonal migrations, which can influence the composition of the community.
Comparing data between different surveys is also made easier by this histogram. By placing species abundance distributions for different periods side by side, it can help to identify similarities and differences between ecosystems at different times of the year. This comparison can reveal general ecological patterns or specific responses to environmental disturbances.
For ease of reading, species accounting for less than 1% of total abundance are not shown on the histogram.

The figure in the top right corner is a curve of cumulative abundance as a function of the number of species in descending order of abundance. Cumulative abundance refers to the cumulative sum of species abundances in a dataset, starting with the most abundant species and successively adding the abundances of the following species in descending order. The cumulative abundance curve allows for the assessment of species diversity and distribution in an ecosystem or biological sample. The straighter the curve, the more diversified the community, whereas a curve that rises quickly then flattens indicates a community where a few species are very abundant while most species are rare. It’s important to note that the cumulative abundance curve is of interest when there are many different species. Here, the curve is constructed from the dataset of a fictitious concession with only 4 species. Therefore, it has limited relevance. In practice, you should not have this kind of result with your data.
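The values behind such a curve can be computed as follows (an illustrative sketch, not the application’s code): species are sorted by decreasing abundance and their relative abundances are summed step by step.

```python
# Illustrative sketch of the cumulative abundance curve: relative abundances
# accumulated over species sorted by decreasing abundance.
def cumulative_abundance(counts):
    total = sum(counts)
    cumulative, running = [], 0
    for c in sorted(counts, reverse=True):
        running += c
        cumulative.append(running / total)
    return cumulative
```

With counts [10, 70, 20] the curve rises as 0.7, 0.9, 1.0: one dominant species, then little added by the remaining species.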

The figure in the bottom right corner represents the number of species as a function of the number of sampled sites. The algorithm, for a given number of sites, will test all existing combinations in the dataset and retrieve the number of species for each combination. Then it calculates an average per number of sites, and it’s this average value that is plotted on the graph.
The shape of the species accumulation curve can provide information about the diversity of the studied ecosystem. If the curve increases rapidly and tends towards a flat asymptote, it suggests that most of the present species have been sampled, providing a good estimate of ecosystem diversity. Conversely, if the curve increases slowly and does not seem to reach a plateau, it indicates that there are still species to be discovered, and sampling should be continued to obtain a more accurate estimate of diversity.
The species accumulation curve is useful for determining the minimum number of samples needed to obtain an adequate representation of species diversity. This can be used during the initial assessment phase to ensure that the proposed sampling plan captures the diversity of the area. Here, constructed from the fictitious dataset, the curve holds little interest.
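The averaging procedure described above (for each number of sites, examine every combination present in the dataset and retain the mean species richness) can be sketched as follows. This is only an illustration: exhaustive enumeration grows combinatorially with the number of sites, so the application may well compute the curve differently for large sampling designs.

```python
from itertools import combinations

# Illustrative sketch of the species accumulation curve: for each k, the mean
# number of distinct species over all combinations of k sampled sites.
def species_accumulation(site_species):
    """site_species: one set of observed species per sampled site."""
    n = len(site_species)
    curve = []
    for k in range(1, n + 1):
        richness = [len(set().union(*combo)) for combo in combinations(site_species, k)]
        curve.append(sum(richness) / len(richness))
    return curve
```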

Descriptive statistics

Data representation

In this section, we focus on a specific indicator (species abundance, total biomass, diversity index, etc.) and compare it to the explanatory variables in our dataset. We are looking for potential effects or correlations upstream of inferential statistics. Therefore, we need to select a variable that we aim to explain based on parameters related to data acquisition and extraction. The statistical analysis will be conducted on this variable.
Initially, the table summarizing the explained variable provides information on the number of zeros and missing values, the total length of the value series, and the fraction of zeros and missing values compared to the total values. It also gives the mean, extremes, standard deviation, and quartiles of the series. The most important values for the “inferential statistics” part are those indicating the number of zeros and missing values (“n_missing”, “complete_rate”). The proportion of zeros in the data determines the choice of modeling method. Indeed, a high proportion of zeros can compromise the implementation of a generalized linear mixed model.
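The counts and fractions described above can be sketched as follows. The “n_missing” and “complete_rate” names mirror the columns mentioned in the text; “zero_fraction” is a hypothetical name for the proportion of zeros among the observed values:

```python
import math

# Illustrative sketch of the summary of the explained variable: counts of
# missing values (None/NaN) and zeros, plus the fractions used to judge
# whether a GLM/GLMM is advisable.
def variable_summary(values):
    n = len(values)
    present = [v for v in values
               if not (v is None or (isinstance(v, float) and math.isnan(v)))]
    n_missing = n - len(present)
    n_zero = sum(1 for v in present if v == 0)
    return {
        "n": n,
        "n_missing": n_missing,
        "complete_rate": len(present) / n,
        "zero_fraction": n_zero / len(present) if present else float("nan"),
    }
```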

Following this table, it’s possible to visualize boxplots. The boxplot provides another representation to interpret the relationship between the explained variable and the explanatory variables such as impact, year, survey, station, and season. In graphical representations of statistical data, the boxplot is a quick way to illustrate the essential profile of a quantitative statistical series. The boxplot summarizes some position indicators of the studied characteristic (median, quartiles, minimum, maximum, or deciles). It is often used to quickly compare two series. In GranulatShiny, the series of the explained variable (here abundance) in the impacted zone is compared with that of the non-impacted zone. In the figure below, the same variable is represented on a linear scale (on the left) and on a logarithmic scale (on the right). The logarithmic scale is offered in the application to transform the variable into a pseudo-normal distribution and provide more meaning to the boxplot representation by limiting the influence of extreme values. Each boxplot is constructed as follows: the horizontal line crossing the white square corresponds to the median, the upper and lower edges of the white square correspond to the 75th and 25th percentiles (the upper and lower quartiles) respectively, the ends of the “whiskers” correspond to the 95th and 5th percentiles; finally, the points correspond to extreme values.
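The five quantities drawn by these boxplots can be computed as below, using the linear-interpolation percentile convention that R and NumPy apply by default (an illustrative sketch, not the application’s code):

```python
# Illustrative sketch of boxplot elements: median, 25th/75th percentiles
# (box edges) and 5th/95th percentiles (whisker ends).
def percentile(sorted_values, p):
    """Linear-interpolation percentile on an already sorted list."""
    idx = (len(sorted_values) - 1) * p / 100.0
    lo = int(idx)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = idx - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def boxplot_elements(values):
    v = sorted(values)
    return {name: percentile(v, p)
            for name, p in [("p5", 5), ("q1", 25), ("median", 50), ("q3", 75), ("p95", 95)]}
```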

In paragraph 8.4.1 of the “Halieutic Protocol”, it is indicated that the data should be described and analysed by size group, maturity or functional group. The GranulatShiny application does not allow data tables to be subdivided into subgroups. This must be done upstream by the user. To analyse a particular functional group, you need to sort your “TuttiCatch” file and save this new table so that you can integrate it into the application. This enables the GranulatShiny statistical method to be applied to the species to be treated separately.

Once you have explored the data, you can move on to the next tab by pressing the “Choose distribution probability” button or by clicking on “Diagnosis of analysis”.

Diagnostic analysis

This tab allows you to select and visualize the probability distribution that best fits the explained variable. The frequency histogram of the variable under study (selected by the user) is represented by gray bars. It depicts the empirical distribution of the observed data. It’s constructed by grouping the data of the variable into intervals and counting the number of observations in each interval. This provides a visualization of the distribution of the variable’s values.
The density function (in blue in the example) is an estimation of the probability distribution of the variable’s data. It’s calculated by fitting different statistical distribution models to the observed data. These models may include normal, Poisson, exponential distributions, etc. The density function represents the probability that an observation falls within a particular range of values.
The probability distribution (in green in the example) represents the statistical distribution model that would best match the density function. It’s chosen by the user to best overlay with the blue curve. The parameters of each probability distribution are approximated using the mean and standard deviation of the variable.

By examining these three elements together, you can visually assess how well the adjusted probability distribution model fits the observed data. A good match between the three indicates that the model is a precise representation of the data distribution. However, significant discrepancies may indicate inadequacies in the chosen model or particular characteristics of the data that require further analysis. You can change the type of probability distribution to test which one seems to fit best. If the chosen distribution does not fit at all, a warning message appears. In the example, abundance is represented, and the chosen law is a Lognormal distribution.
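As an illustration of the parameter approximation mentioned above, a Lognormal overlay can be parameterised from the mean and standard deviation of the log-transformed values (a sketch only; the application performs this internally in R):

```python
import math

# Illustrative sketch: approximate the parameters of a Lognormal overlay from
# the mean and standard deviation of the natural log of the (strictly
# positive) observed values.
def lognormal_params(values):
    logs = [math.log(v) for v in values if v > 0]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma
```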

Once you are satisfied with the probability distribution, check the sentence above the “Modeling” button. There are two possibilities. If you have fewer than 30 observations, the sentence reads: “You don’t have enough values to build a GLM or a GLMM”. In this case, you need to change the working variable because there are not enough values to create a relevant model. Conversely, if the volume of data in the dataset is sufficient (more than 30 observations), you will see: “Once you have chosen a distribution you can move on to building the model”. When you are done, press the “Modelling” button.

Modelling

Model creation

This section is devoted to creating a model for inferential analysis. The variable being analysed is shown at the top left of the tab and can be modified in the “data representation” tab. The application allows you to perform 3 types of inferential tests: GLMM, GLM and PERMANOVA. This chapter begins with a reminder of the general statistical principles used in the application.

  General principles of statistics needed to use the tool

    Parametric and non-parametric methods

The field of statistics exists because it is impossible to collect data from all individuals concerned with the subject of a given question (population). The only solution is to collect data from a subset of the concerned individuals (sample), but the true objective is to understand the “truth” about the population in a statistical sense. The population is thus approximated by studying descriptive variables of its characteristics. Each studied descriptive variable is a statistical object, which can be described by indicators. Statistical indicators such as the mean, standard deviation, and quartiles are used to summarize the information about an observed variable. When studying a sample considered representative, these indicators are used to construct the distribution law of the studied variable. Each indicator corresponds to a parameter of this distribution law. It is then assumed that the distribution law obtained for this variable from the sample is applicable to the population. Therefore, a statistic estimates a parameter.

Parametric statistical methods, such as linear models or generalized linear models, assume that the data follow a specific distribution (i.e., normal distribution) in the underlying population. Additionally, they often assume that certain parameters of this distribution, such as the mean and standard deviation, are known or can be estimated from the data. Unlike parametric methods, non-parametric statistical methods do not impose specific assumptions on the shape of the data distribution.

    Linear models

A classical linear model allows for studying the statistical relationship between a response variable Y and the explanatory variables X. Let y_i be the response of individual i and x_i the values taken by the explanatory variables for this individual. The relationship between X, the matrix of explanatory variables, and Y, the response vector, can be written as: \[Y = α + β.X + ε \text{ with } ε \sim N(0,\sigma)\] where ε represents the model’s residuals. They follow a centered normal distribution with homogeneous variance and are independent of the explanatory variables. The term α corresponds to the intercept, and β is a vector composed of the estimated coefficients of the model for each of the explanatory variables making up the matrix X. The response variable for a linear model should be approximately normally distributed.
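The estimation principle can be illustrated for a single explanatory variable with ordinary least squares (the application itself relies on R’s model-fitting machinery; this is only a sketch):

```python
# Minimal illustration: fitting Y = alpha + beta*X + eps by ordinary least
# squares for one explanatory variable.
def ols(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
            / sum((xi - mean_x) ** 2 for xi in x))
    alpha = mean_y - beta * mean_x
    return alpha, beta
```

On noise-free data following y = 1 + 2x, the estimates recover α = 1 and β = 2.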

    Generalized linear models

Linear models have wide application but cannot handle discrete or asymmetric response variables, which violate the assumptions of this analytical framework. For example, count response variables generally follow a Poisson distribution, while binary variables such as presence/absence are associated with a binomial distribution. Generalized Linear Models (GLM) extend the methodological framework of linear modelling to a wider class of response types, such as those listed above.

An important thing to understand in GLMs is the relationship between the values of the response variable Y (as measured in the data and predicted by the model in the fitted values) and the linear predictor. The linear predictor emerges from the linear model as a sum of each term in the model. The linear predictor corresponds to the Y variable only in the case of a classical linear model following a normal distribution. In the case of a generalised linear model, it is the link function, g, which links the value Y to its linear predictor N. We can therefore write the associated statistical model as follows: \[ N = g(Y) = α + β.X + ε\] \[\text{and } Y = g^{-1}(N) \] \[\text{with } ε \sim N(0,\sigma)\] The value of N is obtained by transforming the value of Y by the link function g, and the predicted value of Y is obtained by applying the inverse link function, g^{-1}, to N.
By using different probability distributions, and therefore different link functions, it is possible to observe the consequences on the distribution of the model’s residuals. The most appropriate link function is the one that produces residuals most consistent with the assumptions of the linear model.
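To make the link-function mechanics concrete, here is a minimal sketch (Python, hypothetical coefficients) of the log link typically paired with a Poisson distribution:

```python
import numpy as np

alpha, beta = 0.5, 0.3          # hypothetical coefficients
x = np.array([0.0, 1.0, 2.0])   # hypothetical explanatory values

# Linear predictor N = alpha + beta*x, on the scale of g(Y)
N = alpha + beta * x

# For the log link, g = log and g^{-1} = exp: predictions on the Y scale
Y_pred = np.exp(N)

# Applying the link to the predictions recovers the linear predictor
print(np.allclose(np.log(Y_pred), N))
```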

    Generalized linear mixed model

Generalized linear mixed models (GLMM) are an extension of GLMs. A model is called “mixed” when it includes at least one “fixed” effect and at least one “random” effect. Fixed effects correspond to the explanatory variables whose effect on the response variable is of interest and for which the user wishes to estimate an associated parameter (βi, the size of the effect of explanatory variable i). No coefficient is estimated for the random effects themselves; they only indicate to the model that the data are not independent and reflect a correlation between the statistical units. From a statistical point of view, incorporating random effects improves the estimation of the model’s residual deviance by structuring the error term (ε). In doing so, it limits biases in the estimation of the model’s parameters (β) and of their standard errors. Ultimately, this results in parameter values closer to reality and more reliable p-values.

    Analysis of variance using permutations

PERMANOVA, or Permutational Multivariate Analysis of Variance, is a statistical method used to analyze differences between several groups defined by qualitative characteristics, such as different treatments in an experimental study.

Unlike other statistical methods that require certain assumptions about the distribution of residuals when constructing a model, PERMANOVA is a non-parametric method. It focuses on a distance matrix between the studied elements. This method allows for working with one or more response variables and examining the effects of one or more categorical variables on the data.

The goal of PERMANOVA is to determine whether there are significant differences between groups, without quantifying those differences. To achieve this, it compares the variation between groups (between-group variance, denoted SS inter in R outputs for “sum of squares inter-groups”: the sum of squared deviations of each group mean from the overall mean) to the variation within groups (within-group variance, denoted SS intra in R outputs for “sum of squares intra-groups”: the sum of squared deviations of each observation from the mean of the group it belongs to). A high SS inter value suggests marked differences between the group means. Conversely, a low SS intra value indicates that observations within each group are similar to one another.

PERMANOVA uses random permutation of categories or sample groups to assess whether the observed differences between groups are statistically significant. First, a measure of the difference between groups is calculated from the original data. Then, the sample labels are randomly shuffled, and this measure is recalculated to see if the differences persist even after shuffling. This process is repeated many times to create a distribution of this measure under the null hypothesis (no difference between groups). By comparing the measure obtained from the original data to this distribution, PERMANOVA determines if the observed differences are likely real or simply due to chance.
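The permutation logic described above can be sketched as follows (Python, hypothetical univariate data and a simple difference-of-means statistic; the real PERMANOVA works on a multivariate distance matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for two groups (e.g. impact vs control stations)
values = np.concatenate([rng.normal(10.0, 1.0, 30),
                         rng.normal(12.0, 1.0, 30)])
labels = np.array([0] * 30 + [1] * 30)

def stat(values, labels):
    """Simple between-group statistic: absolute difference of group means."""
    return abs(values[labels == 0].mean() - values[labels == 1].mean())

observed = stat(values, labels)

# Null distribution: recompute the statistic after shuffling the labels
n_perm = 999
perm_stats = np.array([stat(values, rng.permutation(labels))
                       for _ in range(n_perm)])

# p-value: share of permutations at least as extreme as the observed value
p_value = (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)
print(p_value)
```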

However, PERMANOVA has some limitations. It cannot determine which specific group differs from the others, only that at least one group is different. Additionally, the presence of zero values can bias the estimation of similarity between elements, which is particularly problematic in ecology where a zero may indicate the absence of a species. This limitation can be mitigated by choosing an appropriate association coefficient in the calculation of the distance matrix.

  Writing the model with GranulatShiny

In GranulatShiny, the models are centered on the treatment variable because the environmental monitoring of marine aggregate extraction concessions aims to characterize the impact of the activity on various compartments of the marine environment according to the BACI (Before After Control Impact) approach. By definition, the BACI approach compares control (i.e., non-impacted) sampling stations with impacted sampling stations and tests for differences before and after the introduction of the considered anthropogenic disturbance (in this case, marine aggregate extraction). This approach is commonly used in oceanic environmental monitoring, and a well-designed BACI approach remains one of the best methods for environmental impact monitoring programs. Unfortunately, this method has several limitations that compromise its ability to detect effects, particularly because the ocean is spatially and temporally dynamic, and finding two locations that are statistically identical to each other while being geographically far enough apart to be statistically independent is a real challenge.

Depending on the modeling method you choose, the model formulation differs. The GLMM will take into account two fixed explanatory variables, treatment and season, and their interaction, as well as two random explanatory variables, survey and station. If we take abundance as the response variable, the model formulation in R and thus in the GranulatShiny interface is as follows: \[Abun∼traitement∗saison+(1∣campagne)+(1∣station)\] The GLM will take into account only the fixed explanatory variables considered in the GLMM and their interaction: \[Abun∼traitement∗saison\] The PERMANOVA will take into account the same explanatory variables as the GLM: \[Abun∼traitement∗saison\] The treatment variable is used to analyze the impact of aggregate extraction on fish populations. The season variable allows for accounting for the seasonality of the data. The random effect station accounts for the spatial variability of the environment. The random effect survey accounts for the temporal variability of the environment.
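For readers unfamiliar with R formula notation, `traitement*saison` expands to both main effects plus their interaction. The sketch below (Python/pandas, hypothetical factor levels; in R this expansion is performed automatically, e.g. by `model.matrix`) shows the resulting design-matrix columns:

```python
import pandas as pd

# Hypothetical observations with the two fixed factors of the model
df = pd.DataFrame({
    "traitement": ["impact", "sans_impact"] * 4,
    "saison": ["winter", "winter", "spring", "spring",
               "summer", "summer", "autumn", "autumn"],
})

# Main-effect dummy columns (the first level of each factor is the baseline)
trt = pd.get_dummies(df["traitement"], prefix="traitement",
                     drop_first=True, dtype=int)
sea = pd.get_dummies(df["saison"], prefix="saison",
                     drop_first=True, dtype=int)

# Interaction columns: elementwise products of the dummy columns
inter = pd.DataFrame({f"{t}:{s}": trt[t] * sea[s] for t in trt for s in sea})

X = pd.concat([trt, sea, inter], axis=1)
print(list(X.columns))  # 1 treatment + 3 season + 3 interaction columns
```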

For a GLMM and a GLM, you will need to choose a probability distribution. By default, the interface proposes the last probability distribution you used in the previous section. Note that the method used for modeling is iterative, meaning that the final result is obtained through successive adjustments. Therefore, the distribution that seemed most appropriate in the previous section may not necessarily be the one that allows the model to converge best. However, the “analysis diagnostics” tab should have helped narrow down the possible distributions to a limited number so that you don’t have to test all of them here.

You can also choose to keep or omit the interaction between the covariates treatment and season. Note: if the interaction does not add anything to the model, it will be automatically removed. You can also add other covariates to your model. They will be added without interaction with the others. When you are ready, you can click on “start modeling”.

    Generalized linear mixed model

Paragraph 8.4.2 of the “Halieutic Protocol”:
“Using inferential analysis is necessary to evaluate the temporal and spatial variability of different indicators of fishery resources before extraction: employing generalized linear mixed models (GLMMs) with temporal and spatial variables defined as crossed random effects and a fixed seasonal effect.”

The GLMM is the preferred method. The output called “summary” in the interface reproduces that of the R software for the corresponding command line. From the dropdown menu, you can choose to display either the analysis of variance table, which condenses the modeling results into an evaluation of the importance of the fixed effects in the model and of their impact on the response variable, or the comprehensive summary of these results. At the bottom left, you can choose to display the results of the model before optimization (initial choice) or of the optimized model (final choice).

Reproduction of the R output of the GLMM model on abundance.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: gaussian  ( identity )
## Formula: log(Abun) ~ traitement * saison + (1 | campagne) + (1 | station)
##    Data: dataset
## 
##      AIC      BIC   logLik deviance df.resid 
##   1676.1   1726.4   -827.0   1654.1      708 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1925 -0.6111  0.0535  0.5899  3.5320 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  campagne (Intercept) 0.010794 0.10390 
##  station  (Intercept) 0.004959 0.07042 
##  Residual             0.573266 0.75714 
## Number of obs: 719, groups:  campagne, 36; station, 20
## 
## Fixed effects:
##                                    Estimate Std. Error t value Pr(>|z|)    
## (Intercept)                        10.68015    0.10461 102.092  < 2e-16 ***
## traitementSans impact               0.11829    0.12315   0.961  0.33679    
## saisonSpring                        0.01607    0.14258   0.113  0.91025    
## saisonSummer                       -0.15548    0.14264  -1.090  0.27571    
## saisonAutumn                       -0.04808    0.14261  -0.337  0.73603    
## traitementSans impact:saisonSpring  0.46811    0.16796   2.787  0.00532 ** 
## traitementSans impact:saisonSummer  1.05791    0.16815   6.291 3.14e-10 ***
## traitementSans impact:saisonAutumn  0.44990    0.16803   2.678  0.00742 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##                      (Intr) trtmSi ssnSpr ssnSmm ssnAtm trtmntSnsimpct:ssnSp
## trtmntSnsim          -0.719                                                 
## saisonSprng          -0.681  0.491                                          
## saisonSummr          -0.681  0.491  0.500                                   
## saisonAutmn          -0.681  0.490  0.500  0.499                            
## trtmntSnsimpct:ssnSp  0.491 -0.682 -0.720 -0.360 -0.360                     
## trtmntSnsimpct:ssnSm  0.491 -0.682 -0.359 -0.720 -0.359  0.499              
## trtmntSim:A           0.490 -0.681 -0.360 -0.359 -0.720  0.500              
##                      trtmntSnsimpct:ssnSm
## trtmntSnsim                              
## saisonSprng                              
## saisonSummr                              
## saisonAutmn                              
## trtmntSnsimpct:ssnSp                     
## trtmntSnsimpct:ssnSm                     
## trtmntSim:A           0.498

The top of the results window shows the model used to calculate the effects.

## [[1]]
## [1] "Generalized linear mixed model fit by maximum likelihood (Laplace Approximation)"
## 
## [[2]]
## [1] "gaussian" "identity"
## 
## [[3]]
## glmer(formula = log(Abun) ~ traitement * saison + (1 | campagne) + 
##     (1 | station), data = dataset, family = gaussian(link = identity))

The displayed elements are:
[[1]] the model type with the calculation method used,
[[2]] the probability distribution and its link function,
[[3]] the complete command line, which allows verifying that the model formulation corresponds to the user’s expectations.

Next, it is possible to read scores describing the model’s likelihood given the data and the parameters selected for model construction.

Given an observed sample (x1,…,xn) and a probability distribution Pθ, the likelihood quantifies the probability that the observations actually come from a (theoretical) sample of the distribution Pθ. The likelihood associated with the probability distribution Pθ is the function L such that: \[\displaystyle L(x_1,\ldots,x_n,\theta) = \prod_{i=1}^n P_\theta(x_i)\; \]
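A minimal numerical illustration (Python, hypothetical normal sample): because the likelihood is a product over observations, in practice it is computed as a sum of log-densities.

```python
import math

# Hypothetical sample and two candidate means for a normal distribution
sample = [9.8, 10.1, 10.4, 9.9, 10.2]
sigma = 0.25

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) evaluated at x."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(sample, mu, sigma):
    """Log of the product of densities = sum of the log-densities."""
    return sum(normal_logpdf(x, mu, sigma) for x in sample)

# A mean close to the data yields a higher likelihood than a distant one
print(log_likelihood(sample, 10.0, sigma) > log_likelihood(sample, 12.0, sigma))
```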

##       AIC       BIC    logLik  deviance  df.resid 
## 1676.0889 1726.4454 -827.0445 1654.0889  708.0000


AIC (Akaike Information Criterion): AIC is a model selection criterion that takes into account both the model’s fit quality and its complexity. It favors models that fit the data well while being simple. A model with a lower AIC is considered preferable. However, AIC does not provide an indication of the absolute model fit but only its relative fit compared to other candidate models.
BIC (Bayesian Information Criterion): BIC is another model selection criterion that, like AIC, considers both the fit and complexity of the model. However, BIC penalizes the model’s complexity more severely than AIC. A model with a lower BIC is considered preferable.
logLik (Log-Likelihood): The log-likelihood is a measure of the model’s fit to the data. It represents the (log of the) probability that the observed data were generated by the fitted model: the higher the log-likelihood, the better the model fits the data. The logarithm of the likelihood is used rather than the likelihood itself for practical numerical reasons and because it enters the calculation of the deviance.
Deviance: The deviance measures the model’s fit relative to a reference model, the saturated model (a model that reproduces the data perfectly). A lower deviance indicates a better fit of the model to the data. It is calculated from the likelihood (denoted L) according to the formula: \[Deviance = 2.log\frac{L_\text{saturated model}}{L_\text{model}}=2.(log(L_\text{saturated model})-log(L_\text{model}))\]
The residual degrees of freedom represent the number of independent data points remaining once the model has been fitted. They are used to calculate test statistics and associated p-values.
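These quantities are linked by simple arithmetic that can be checked against the output above (k = 719 − 708 = 11 estimated parameters; for this Gaussian fit, the reported deviance equals −2·logLik):

```python
import math

# Values read from the GLMM output above
log_lik = -827.0445
n_obs = 719           # number of observations
df_resid = 708        # residual degrees of freedom
k = n_obs - df_resid  # 11 estimated parameters

aic = 2 * k - 2 * log_lik                # AIC = 2k - 2*logLik
bic = k * math.log(n_obs) - 2 * log_lik  # BIC penalizes complexity via log(n)
deviance = -2 * log_lik                  # as reported for this Gaussian fit

print(round(aic, 1), round(bic, 1), round(deviance, 1))
```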

The “scaled residuals” are the standardized residuals of the model. Tests are performed on them to check for the model’s good convergence and fit.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -3.192524 -0.611129  0.053534  0.000016  0.589887  3.532013

The table of random effects is specific to GLMMs. It provides information about this part of the formula: (1 | campagne) + (1 | station). As indicated above, no parameter is estimated for these factors; instead, the model evaluates the standard deviation they generate around the intercept value.

##  Groups   Name        Std.Dev.
##  campagne (Intercept) 0.10390 
##  station  (Intercept) 0.07042 
##  Residual             0.75714

Finally, there is the section on fixed effects. This part allows for a diagnosis of the factors and the study variable. The table of fixed effects provides key information on the estimated effects of the predictor variables, their precision, and importance, helping to understand the relationships between variables and draw conclusions from the data.

Estimation (Estimate): This column indicates the estimated coefficients (or effects) of each predictor variable in the model. The estimated effect of the “Intercept” represents the estimated average value of the response variable when all other predictor variables are zero.
Standard Error (Std. Error): This column indicates the standard errors associated with each coefficient estimate. Standard errors measure the variability of the estimate. Lower standard errors indicate more precise estimates.
t Value: This column indicates the t-statistic for testing the null hypothesis that the coefficient is equal to zero. It is calculated by dividing the estimate by its standard error. Higher absolute t-values indicate stronger evidence against the null hypothesis.
Pr(>|z|): This column indicates the p-value associated with the t-statistic for each coefficient. It indicates the probability of observing the data if the null hypothesis (no effect) were true. Lower p-values suggest stronger evidence against the null hypothesis and indicate that the coefficient is statistically significant.

##                                       Estimate Std. Error     t value
## (Intercept)                        10.68014605  0.1046130 102.0919158
## traitementSans impact               0.11828840  0.1231490   0.9605307
## saisonSpring                        0.01607218  0.1425775   0.1127259
## saisonSummer                       -0.15547873  0.1426408  -1.0900021
## saisonAutumn                       -0.04807517  0.1426082  -0.3371136
## traitementSans impact:saisonSpring  0.46810522  0.1679592   2.7870176
## traitementSans impact:saisonSummer  1.05790539  0.1681487   6.2914879
## traitementSans impact:saisonAutumn  0.44990449  0.1680289   2.6775420
##                                        Pr(>|z|)
## (Intercept)                        0.000000e+00
## traitementSans impact              3.367882e-01
## saisonSpring                       9.102479e-01
## saisonSummer                       2.757122e-01
## saisonAutumn                       7.360312e-01
## traitementSans impact:saisonSpring 5.319559e-03
## traitementSans impact:saisonSummer 3.144374e-10
## traitementSans impact:saisonAutumn 7.416455e-03

The “Estimate” column gives the average value taken by the studied variable (in our example, total abundance) for each level of the explanatory variables. In our example, abundance is explained by the variable “treatment” with 2 levels (impact, no impact) and the variable “season” with 4 levels (winter, spring, summer, autumn). The (Intercept) row corresponds to a baseline value, associated with one level of each of our variables. The “Pr(>|z|)” column helps evaluate whether a change is significant, by checking whether the value is less than 0.05. In this example, it is the interaction between season and treatment that produces significant changes: if these two variables were considered separately, their effect on abundance would not be visible.
Thus, here:

  • In winter and in an impacted zone, the logarithm of abundance (since a Lognormal distribution is used) is on average 10.68;
  • to know the value of the logarithm of abundance in winter and without impact, you need to add the Intercept value and the estimate value of the “treatmentNo impact” row, which is 10.68 + 0.12, equaling 10.8;
  • to get the value of the logarithm of abundance in summer and with impact, you add the Intercept value to the estimate value of the “seasonSummer” row, which is 10.68 + (-0.16), equaling 10.52;
  • Finally, by adding the Intercept value with those of the rows “treatmentNo impact”, “seasonSummer”, “treatmentNo impact:seasonSummer”, you get the logarithm of abundance in summer and without impact, which is 10.68 + 0.12 + (-0.16) + 1.06, equaling 11.70.
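The bullet points above amount to summing the relevant coefficients; a sketch with the rounded estimates from the table:

```python
# Rounded fixed-effect estimates from the table above (log-abundance scale)
intercept = 10.68      # (Intercept): winter, impacted zone
trt_no_impact = 0.12   # traitementSans impact
summer = -0.16         # saisonSummer
trt_x_summer = 1.06    # traitementSans impact:saisonSummer

winter_impact = intercept                                             # 10.68
winter_no_impact = intercept + trt_no_impact                          # 10.80
summer_impact = intercept + summer                                    # 10.52
summer_no_impact = intercept + trt_no_impact + summer + trt_x_summer  # 11.70

print(winter_no_impact, summer_impact, summer_no_impact)
```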

The last section shows the correlations between the fixed-effect terms of the model. Each row and each column represent a fixed-effect term, and the table values are the correlations between these terms. These correlations are calculated based on the covariance matrix of the fixed-effect estimates. They indicate how the estimated effects of the different fixed factors in the model are related to each other.

Post-hoc analysis of the GLMM model on abundance.

After obtaining the modeling results, the application offers a series of graphs to diagnose the quality of the model fit, focusing on residual analysis. These graphs are generated using the DHARMa package (“Residual Diagnostics for Hierarchical (Multi-level/Mixed) Regression Models”) in R.

The left graph, called the “QQ plot residuals” or quantile-quantile plot of the residuals, plots the observed residual quantiles against those expected under the fitted model. In this plot, each point represents a residual calculated by the model for a given observation. Ideally, these points should closely follow the red diagonal line, indicating that the residuals follow the expected distribution. Significant deviations from this red line suggest a poor fit of the model to the observed data.

In addition to visualizing the residuals, the DHARMa tool offers three tests to assess the model fit quality:

Kolmogorov-Smirnov Test: This hypothesis test evaluates whether the sample of residuals follows a known distribution, determined by its continuous distribution function. Significant deviations of the residuals from this expected distribution may indicate a poor model fit to the data.

Dispersion Test: This test compares the observed standard deviation of the residuals to that expected based on the data simulation. A significant difference may suggest under- or over-dispersion of the residuals compared to the model’s expectations.

Outlier Test: This test checks if the number of observations with residuals outside the simulation envelope matches the model’s expectations. A significant deviation in this number can indicate the presence of outliers or a poor model fit.

Each test provides a measure of the deviation from the model expectations with an associated “p-value.” A low “p-value” (< 0.05) generally indicates a significant deviation from the model expectations, while a high “p-value” suggests that the observed deviation could be due to chance and is not statistically significant. If this deviation is significant, it is highlighted in red, indicating that the corresponding test does not meet the model’s expectations. These diagnostics help identify mismatches between the model and the observed data and guide necessary adjustments for a better-fitting model.

On the right graph, tests are performed on the uniformity and homogeneity of variance among the groups evaluated in the model. The “within-group deviation from uniformity” test is displayed as a boxplot representing the distribution of residual deviations within each group defined by the qualitative factors of your model. Each group is represented by a box: the median is indicated by a line inside the box, the first and third quartiles by the lower and upper edges of the box, and the whiskers extend to the minimum and maximum values; points beyond the whiskers are considered outliers. The goal of this test is to identify groups whose residuals show significant deviations from a uniform distribution (these appear in red). Significant deviations can indicate a poor model fit for specific groups.

The second test corresponds to a Levene’s Test. Levene’s Test is used to assess whether the variances of the residuals differ significantly between the groups defined by the qualitative factors. It tests the null hypothesis that the variances are equal across all groups. A low “p-value” (generally < 0.05) indicates a significant difference in the variances of the residuals between the groups, suggesting that the homogeneity of variances assumption is not met.
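Levene’s test is essentially a one-way ANOVA on the absolute deviations of the residuals from their group means. A sketch (Python, hypothetical residuals in which the second group is deliberately more dispersed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical residuals for two groups; group B has a larger spread
groups = [rng.normal(0.0, 1.0, 50), rng.normal(0.0, 3.0, 50)]

# Levene's statistic: one-way ANOVA on |residual - group mean|
z = [np.abs(g - g.mean()) for g in groups]
n = sum(len(zi) for zi in z)
k = len(z)
grand_mean = np.concatenate(z).mean()
ss_between = sum(len(zi) * (zi.mean() - grand_mean) ** 2 for zi in z)
ss_within = sum(((zi - zi.mean()) ** 2).sum() for zi in z)
W = (ss_between / (k - 1)) / (ss_within / (n - k))

# W is compared to an F(k-1, n-k) distribution; a large value rejects
# the null hypothesis of equal variances across groups
print(round(W, 2))
```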

In the case of the example, the Kolmogorov-Smirnov, Outlier, and Dispersion tests are not significant, so there is no problem. If any of these tests were displayed in red, it would indicate that the model is not optimal, and it would be possible to look for another model that would fit better. Since these models are based on real data, it is sometimes impossible to find a perfect model. In that case, the model with the fewest warnings should be chosen. We can also see that the uniformity test is validated, but not the homogeneity test. Once the model is validated, you can change tabs and move on to visualizing the effects associated with the model. If we add the “year” covariate to our GLMM model, representing 96 levels, the graph comparing the groups is replaced by a graph comparing the global predictions with the model residuals.

In the particular case where there are too many different levels due to multiple covariates (or the explanatory variables are quantitative), the part on uniformity and homogeneity of groups is replaced by a representation of the model residuals compared to the model predictions. If there are no issues, the phrase “No significant problems detected” is displayed at the top of the graph. If there are significant deviations of the residuals from a uniform distribution across different quantiles, or if the observed quantile deviations are statistically significant, the associated tests appear in red, suggesting a model inadequacy for certain aspects of the data. Finally, outliers from the simulation (data points that fall outside the range of simulated values) are highlighted with red stars. These points should be interpreted with caution, as we do not know “how much” these values deviate from the model’s expectations. The important thing is to ensure that the verification tests are not significant.

By combining residual analysis and multiple diagnostic tests, GranulatShiny provides a thorough evaluation of the model fit, allowing users to identify and address potential issues before proceeding to the analysis of effects and their visualization.

In some cases, the GLMM modeling does not converge. This means that the available data does not allow the calculation algorithm, associated with the model formulation decided by the user, to estimate parameter values. In such cases, an error message appears: “There is an error during modeling. Please change the distribution or the model.” It is also possible that the model produces results, but the post-hoc residual analysis is not satisfactory.
Example:

When GLMM modeling does not converge, it is preferable to opt for modeling methods associated with simpler calculation algorithms, i.e., with fewer parameters to estimate. GranulatShiny provides two such alternatives: GLM and PERMANOVA.
NOTE: When the dataset used for modeling consists of 30 or fewer non-zero observations, it is preferable to stick to the least computationally expensive method, namely PERMANOVA.

    Generalized linear model

Reproduction of the R output of the GLM model on abundance.

## 
## Call:
## glm(formula = log(Abun) ~ traitement * saison, family = gaussian(link = identity), 
##     data = dataset)
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        10.68271    0.09201 116.106  < 2e-16 ***
## traitementSans impact               0.11410    0.11770   0.969  0.33266    
## saisonSpring                        0.01560    0.13012   0.120  0.90458    
## saisonSummer                       -0.14765    0.13012  -1.135  0.25687    
## saisonAutumn                       -0.05388    0.13012  -0.414  0.67896    
## traitementSans impact:saisonSpring  0.46888    0.16645   2.817  0.00498 ** 
## traitementSans impact:saisonSummer  1.04555    0.16660   6.276 6.04e-10 ***
## traitementSans impact:saisonAutumn  0.45938    0.16645   2.760  0.00593 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.5925863)
## 
##     Null deviance: 529.85  on 718  degrees of freedom
## Residual deviance: 421.33  on 711  degrees of freedom
## AIC: 1674.2
## 
## Number of Fisher Scoring iterations: 2

The GLM yields results similar to those of the GLMM and is analyzed in the same way. However, it does not take into account the random effects induced by the environment or the method used and will therefore be less precise.

    Analysis of variance by permutation

PERMANOVA method integrated into the application

In GranulatShiny, PERMANOVA is applied to a single-column matrix, generally containing abundances or biomasses. When the variable derives from floristic or faunistic surveys, a zero implies the absence of the considered species. While the co-presence of a species can indicate similar conditions in terms of ecological niches, and presence/absence can contrast two niches, a double absence (double zero) can have various causes (rare species not collected, sampling outside the ecological niche of both species, etc.). A double zero is therefore typically not taken into account when estimating the similarity between two elements, especially in the context of biological community characterization. Since the data on which PERMANOVA is applied here are quantitative and double zeros must be ignored, the most appropriate method is the Bray-Curtis coefficient.

The Bray-Curtis dissimilarity index is used in ecology and biology to assess the dissimilarity between two samples in terms of the abundance of the taxa present in each of them. It ranges from 0 (the two samples have the same composition) to 1 (the samples are completely dissimilar). The Bray-Curtis dissimilarity is often used in the literature. The coefficient is asymmetrical (double zeros do not contribute) and semi-metric (it does not satisfy the triangle inequality). It is calculated as follows: \[ d_{jk}=\frac{\sum_{i} |x_{ij}-x_{ik}|}{\sum_{i} (x_{ij}+x_{ik})} \] where i = column; j,k = compared rows; x = abundance values
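A numerical sketch of the formula (Python, hypothetical abundances), which also shows that a taxon absent from both samples (double zero) leaves the index unchanged:

```python
import numpy as np

# Hypothetical abundances of four taxa at two stations j and k
x_j = np.array([12.0, 0.0, 5.0, 3.0])
x_k = np.array([10.0, 4.0, 0.0, 3.0])

# Bray-Curtis dissimilarity: sum|x_ij - x_ik| / sum(x_ij + x_ik)
def bray_curtis(a, b):
    return np.abs(a - b).sum() / (a + b).sum()

d_jk = bray_curtis(x_j, x_k)  # 11/37, roughly 0.297
print(round(d_jk, 3))

# A double zero contributes 0 to both sums, so the index is unchanged
d_with_double_zero = bray_curtis(np.append(x_j, 0.0), np.append(x_k, 0.0))
print(np.isclose(d_jk, d_with_double_zero))
```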

In the case where the input matrix has only one column, i.e., a single species, and zeros exist in the input values, it sometimes happens that the denominator equals zero, which creates an error in the distance matrix. This issue arises for all indicators typically used for abundance data that weight their distance based on the total abundance in the compared sites, such as the Bray-Curtis dissimilarity. Since this method cannot be used in our case, another distance calculation coefficient was sought. This coefficient also needed to disregard double zeros in quantitative data.

The chosen method is the Chi-square (χ²) metric. Its use is recommended when rare species are good indicators of specific ecological conditions. To apply this method, the data must first be standardized using the Chi-square method, dividing each value by its row total and by the square root of its column total: \[ x'_{ij}=\frac{x_{ij}}{\left(\sum_{i} x_{ij}\right)\sqrt{\sum_{j} x_{ij}}} \] where i = column; j = row; x = abundance values

We then obtain the distance matrix by calculating the Euclidean distance on the standardized data matrix: \[d_{jk}=\sqrt{\sum_{i} (x'_{ij}-x'_{ik})^2} \] where i = column; j,k = compared rows; x' = standardized abundance values
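The two steps, standardization then Euclidean distance, can be sketched as follows (Python, hypothetical abundance matrix with rows = stations and columns = taxa):

```python
import numpy as np

# Hypothetical abundance matrix: rows = stations, columns = taxa
X = np.array([[12.0, 0.0, 5.0],
              [10.0, 4.0, 0.0],
              [ 8.0, 2.0, 2.0]])

# Chi-square standardization: divide each value by its row total
# times the square root of its column total
row_tot = X.sum(axis=1, keepdims=True)
col_tot = X.sum(axis=0, keepdims=True)
X_std = X / (row_tot * np.sqrt(col_tot))

# Distance matrix: Euclidean distance between rows of the standardized matrix
n = X.shape[0]
D = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        D[j, k] = np.sqrt(((X_std[j] - X_std[k]) ** 2).sum())

print(np.round(D, 4))
```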

The downside of this method, where the effects are calculated on transformed data, is that it is not possible to directly quantify the impact of an effect on the initial variable. Therefore, we cannot say whether the effect of an explanatory variable is more or less important on the original data, as it applies to the transformed data. However, if an effect is considered significant on the transformed data, it is also significant on the original data.

Using PERMANOVA

When evaluating the impact of extraction on a rarely sampled species, it is not possible to use a GLMM or GLM. These models are sensitive to a high presence of zeros in the data and cannot converge in such cases. In the application, when this scenario arises, it is recommended to use PERMANOVA. However, as shown above, for an analysis variable with many zeros, PERMANOVA must be used with caution and the distance calculation metric must be chosen precisely. These challenging datasets are referred to as “zero-inflated” data, and specific models exist for their analysis. The most commonly used model is the “delta” model. This model analyzes the data in two stages: first, by performing a presence/absence analysis, followed by an analysis on the quantitative values associated with the presence. Currently, this type of model is not implemented in GranulatShiny.

Reproduction of the R output of the PERMANOVA on abundance.

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dist ~ traitement * saison, data = dataset, permutations = 999)
##                    Df SumOfSqs      R2      F Pr(>F)    
## traitement          1 0.013749 0.08340 72.435  0.001 ***
## saison              3 0.008051 0.04883 14.138  0.001 ***
## traitement:saison   3 0.008098 0.04912 14.220  0.001 ***
## Residual          711 0.134958 0.81864                  
## Total             718 0.164856 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the PERMANOVA output table, we find the number of permutations and the model formula. Following that, several indicators associated with each explanatory covariate are provided:

Df (degrees of freedom): This column indicates the degrees of freedom associated with each term in the model.
SumOfSqs (Sum of Squares): This column shows the portion of the total sum of squared distances between observations in the multivariate space that is attributed to each term.
R2 (R-squared): This column indicates the proportion of variance explained by each term in the model. For example, for “treatment,” 8.34% of the variation in the data can be explained by the treatment factor.
F (F statistic): This column presents the F statistic for each term, which tests if the variation explained by that term is significantly greater than what would be expected by chance. Higher F values indicate stronger evidence for rejecting the null hypothesis of no effect.
Pr(>F) (p-value): This column shows the p-value associated with the F statistic for each term. It indicates the probability of obtaining a result at least as extreme as the one observed if the null hypothesis of no effect (i.e., all group means are equal) were true. Lower p-values suggest stronger evidence against the null hypothesis and indicate that the term is a significant predictor of variation.
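The columns of the table are linked by simple arithmetic, which can be checked directly against the output reproduced above: R2 is each term's sum of squares divided by the total, and F is the term's mean square divided by the residual mean square.

```r
# Relationships between the columns of the PERMANOVA table,
# using the sums of squares printed in the output above.
ss <- c(traitement = 0.013749, saison = 0.008051,
        interaction = 0.008098, residual = 0.134958)
df <- c(traitement = 1, saison = 3, interaction = 3, residual = 711)

ss_total <- sum(ss)   # 0.164856, the "Total" row of the table

# R2: share of the total variation explained by each term.
r2 <- ss / ss_total   # r2["traitement"] matches the 0.08340 in the table

# F: mean square of the term over the residual mean square.
f <- (ss / df) / (ss["residual"] / df["residual"])
# f["traitement"] is ~72.43; the table's 72.435 uses unrounded sums of squares.
```

The p-values themselves cannot be recomputed this way: in a PERMANOVA they come from permutations (999 here), not from a parametric F distribution.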

Complementary a posteriori analysis of the PERMANOVA on abundance


The same boxplots as in the “Data Representation” tab are displayed. If the comparison variable (in this example, the treatment) has a significant effect on the explained variable (here, abundance), then the p-value appears in red at the top left of the graph. If no effect is detected during the PERMANOVA, the message “No effect” appears at the top left of the graph.

Effects representation

This tab allows for graphical visualization of the effects of explanatory variables on the explained variable in the case of a GLMM or GLM. In the case of a PERMANOVA, this section is not used, and the graphical window will be blank. This part retransforms the estimated parameters of the model back to the original unit (in the case of abundance, it is the number of individuals per km²). This way, you can visualize the mean value of abundance according to season and treatment. First, you need to select the two predictors to be represented.

If you have multiple covariates, you need to fix them to visualize the graph.
In the example of a GLM that examines total abundance according to treatment and season, here is the obtained graph:


This graph is another way of representing the output table of the model in the previous tab.
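The back-transformation described above can be sketched with a GLM in base R: with a log link, the coefficients live on the log scale, and `predict(type = "response")` returns the fitted means in the original unit (here, individuals per km²). The data and column names below are simulated for illustration and do not come from GranulatShiny.

```r
# Sketch of the back-transformation behind the "Effects representation" tab.
# Simulated data; `abondance`, `traitement`, `saison` are illustrative names.
set.seed(2)
dataset <- data.frame(
  traitement = factor(rep(c("impact", "reference"), each = 120)),
  saison     = factor(rep(c("printemps", "ete", "automne", "hiver"), 60))
)
dataset$abondance <- rpois(240,
  lambda = exp(3 + 0.4 * (dataset$traitement == "impact")))

# Poisson GLM with a log link: effects are estimated on the log scale.
m <- glm(abondance ~ traitement + saison,
         family = poisson(link = "log"), data = dataset)

# Mean abundance in the original unit for each treatment x season combination,
# obtained by back-transforming the linear predictor.
grid <- expand.grid(traitement = levels(dataset$traitement),
                    saison     = levels(dataset$saison))
grid$fitted <- predict(m, newdata = grid, type = "response")
grid
```

Plotting `grid$fitted` against season, with one line or colour per treatment, gives the same kind of figure as the tab described above.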

Statistical power

This part is currently under development. A first version was included in an earlier release of the application; as it could not be generalised, it was withdrawn to ensure the stability of the current version.

Bibliography

Anderson MJ (2017) Permutational Multivariate Analysis of Variance (PERMANOVA). Wiley StatsRef: Statistics Reference Online. John Wiley & Sons, Ltd, pp 1–15

Avezard C, Lavarde P, Pichon A, Legait B, Wallard I (2017) Impact environnemental et économique des activités d’exploration ou d’exploitation des ressources minérales marines.

Bolker BM (2008) Ecological Models and Data in R. doi: 10.2307/j.ctvcm4g37

Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White J-SS (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends in Ecology & Evolution 24: 127–135

Colwell R (2009) Biodiversity: concepts, patterns, and measurement. The Princeton Guide to Ecology. pp 257–263

David V (2019) Statistique pour les sciences environnementales. ISTE Editions, Londres, Royaume-Uni

Gorodetska N, Behaghel G, Dalifard T, Daniel F, Grison X, Hausermann B, Laurent C, De Lantivy S, Lefebvre E, Panonacle H, et al (2023) L’économie bleue en France.

Gregorius H-R, Gillet EM (2008) Generalized Simpson-diversity. Ecological Modelling 211: 90–96

Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271–280

Methratta ET (2020) Monitoring fisheries resources at offshore wind farms: BACI vs. BAG designs. ICES Journal of Marine Science 77: 890–900

Ministère de l’Environnement, de l’Énergie et de la Mer (2016) Guide méthodologique pour l’élaboration des documents d’orientations pour une gestion durable des granulats marins (DOGGM). Ministère de l’Environnement, de l’Énergie et de la Mer, Paris

MTE, UNPG, IFREMER, DREAL, DIRM (2023) Guide technique pour l’élaboration des études d’impact préalables à la recherche et l’exploitation des granulats marins. 48

Oksanen J (2022) Dissimilarity Indices for Community Ecologists.

Ortiz-Burgos S (2016) Shannon-Weaver Diversity Index. In MJ Kennish, ed, Encyclopedia of Estuaries. Springer Netherlands, Dordrecht, pp 572–573

Parent S-E (2020) Analyse et modélisation d’agroécosystèmes.

Rassweiler A, Okamoto DK, Reed DC, Kushner DJ, Schroeder DM, Lafferty KD (2021) Improving the ability of a BACI design to detect impacts within a kelp-forest community. Ecological Applications 31: e02304

Seger KD, Sousa-Lima R, Schmitter-Soto JJ, Urban ER (2021) Editorial: Before-After Control-Impact (BACI) Studies in the Ocean. Frontiers in Marine Science 8:

Shannon CE (1948) A mathematical theory of communication. The Bell System Technical Journal 27: 379–423

Smokorowski KE, Randall RG (2017) Cautions on using the Before-After-Control-Impact design in environmental effects monitoring programs. FACETS 2: 212–232

Underwood AJ (1994) On Beyond BACI: Sampling Designs that Might Reliably Detect Environmental Disturbances. Ecological Applications 4: 4–15

Walker R, Bokuniewicz H, Carlin D, Cato I, Dijkshoorn C, Backer AD, Dalfsen J van, Desprez M, Howe L, Robertsdottir BG, et al (2016) Effects of extraction of marine sediments on the marine environment 2005-2011. doi: 10.17895/ices.pub.5498

WGEXT (2019) Working Group on the Effects of Extraction of Marine Sediments on the Marine Ecosystem (WGEXT). doi: 10.17895/ices.pub.5